Risk-sensitive reinforcement learning algorithms with a generalized average criterion
A reinforcement learning algorithm based on process reward and prioritized sweeping is presented as the interference (conflict) resolution strategy for multi-robot systems.
(4) A new multi-agent cooperation model called MACM is presented and, based on this model, an improved distributed reinforcement learning algorithm is also proposed.
In the first chapter of this paper, a comprehensive survey of research on reinforcement learning theory, algorithms, and applications is provided. The recent developments and future directions for mobile robot navigation control are also discussed.
Reinforcement learning has been applied successfully to single-agent environments. However, because the theory assumes that the environment is Markovian, an assumption that no longer holds in a multi-agent system, traditional reinforcement learning algorithms cannot be applied directly to multi-agent cooperative learning.
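The point about the Markov assumption in the last sentence can be illustrated with a small sketch. The two-agent coordination game below is a hypothetical example, not taken from any of the cited works: the names `reward`, `expected_reward`, and `best_response` and the payoff values are assumptions chosen for illustration. It shows that, from one agent's point of view, the expected reward of the same action changes when the other agent changes its policy, so the environment that a single-agent learner sees is not stationary.

```python
# Hypothetical 2x2 coordination game (an assumption for illustration):
# each agent picks action 0 or 1, and both receive reward 1 when the
# actions match and 0 otherwise.

def reward(a: int, b: int) -> float:
    """Shared reward for agent A's action a and agent B's action b."""
    return 1.0 if a == b else 0.0


def expected_reward(a: int, p_b0: float) -> float:
    """Expected reward to agent A for action a when B plays 0 with probability p_b0."""
    return p_b0 * reward(a, 0) + (1.0 - p_b0) * reward(a, 1)


def best_response(p_b0: float) -> int:
    """Action that maximizes A's expected reward against B's current policy."""
    return max((0, 1), key=lambda a: expected_reward(a, p_b0))


if __name__ == "__main__":
    # As B's policy drifts (for example because B is learning too),
    # the value of each of A's actions drifts with it.
    for p_b0 in (0.9, 0.5, 0.1):
        v0 = expected_reward(0, p_b0)
        v1 = expected_reward(1, p_b0)
        print(f"P(B plays 0)={p_b0:.1f}  E[r|a=0]={v0:.2f}  E[r|a=1]={v1:.2f}")
```

If agent A treated agent B as part of a fixed environment, the value it learned for action 0 while B mostly played 0 would become wrong once B shifted toward action 1. This is one way to see why single-agent algorithms are usually adapted before being used for multi-agent learning.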